
From the TED Talk by Joseph Redmon: How computers learn to recognize objects instantly

Unscramble the Blue Letters

So in just a few yaers, we've gone from 20 seconds per image to 20 milliseconds per image, a tsunaohd times feastr. How did we get there? Well, in the past, object dittcoeen systems would take an image like this and slpit it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image. But this involved running a cialsifesr thousands of times over an image, thousands of neural network evaluations to produce detection. Instead, we trained a single network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our sstyem, instead of looking at an igmae thousands of times to produce detection, you only look once, and that's why we call it the YOLO metohd of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and iercantt with each other.

Open Cloze

So in just a few _____, we've gone from 20 seconds per image to 20 milliseconds per image, a ________ times ______. How did we get there? Well, in the past, object _________ systems would take an image like this and _____ it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image. But this involved running a __________ thousands of times over an image, thousands of neural network evaluations to produce detection. Instead, we trained a single network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our ______, instead of looking at an _____ thousands of times to produce detection, you only look once, and that's why we call it the YOLO ______ of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and ________ with each other.

Solution

  1. faster
  2. detection
  3. method
  4. thousand
  5. years
  6. interact
  7. system
  8. image
  9. classifier
  10. split

Original Text

So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster. How did we get there? Well, in the past, object detection systems would take an image like this and split it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image. But this involved running a classifier thousands of times over an image, thousands of neural network evaluations to produce detection. Instead, we trained a single network to do all of detection for us. It produces all of the bounding boxes and class probabilities simultaneously. With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection. So with this speed, we're not just limited to images; we can process video in real time. And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.
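The arithmetic in the passage can be checked with a short sketch. It assumes a simple sliding-window layout for the older region-based approach; the talk only says the image was split into "a bunch of regions", so the image size, window size, and stride below are hypothetical values chosen for illustration, and the function names are not from the talk.

```python
def speedup(old_ms: float, new_ms: float) -> float:
    """How many times faster the new per-image time is than the old one."""
    return old_ms / new_ms


def count_region_evaluations(width: int, height: int, window: int, stride: int) -> int:
    """Classifier runs needed if a square window slides over the image.

    Assumption: a plain sliding window; real region-based detectors
    chose their regions in other ways.
    """
    cols = (width - window) // stride + 1
    rows = (height - window) // stride + 1
    return rows * cols


# 20 seconds = 20,000 milliseconds per image, versus 20 milliseconds.
print(speedup(20_000, 20))  # 1000.0

# Hypothetical 448x448 image, 64-pixel window, 8-pixel stride.
region_runs = count_region_evaluations(448, 448, window=64, stride=8)
single_pass_runs = 1  # "you only look once"
print(region_runs, single_pass_runs)  # 2401 1
```

With these made-up numbers the region-based approach needs 2401 classifier runs per image, which matches the "thousands of times" in the passage, while the single-network approach needs one.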

Frequently Occurring Word Combinations

ngrams of length 2

collocation frequency
computer vision 5
object detection 4
real time 3
neural network 2
bounding boxes 2
times faster 2
detection system 2
stop signs 2
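
A table like the one above can be produced by counting adjacent word pairs. The following is a minimal sketch, not the method this page actually used: the tokenizer is an assumption, it does not filter out common words, and pairs are counted across sentence boundaries. The frequencies above appear to come from the full talk, so running this on the excerpt alone would give smaller counts.

```python
import re
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count each pair of adjacent words (ngrams of length 2), ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))


sample = "Object detection is fast. Object detection in real time."
counts = bigram_counts(sample)
print(counts[("object", "detection")])  # 2
print(counts[("real", "time")])         # 1
```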

Important Words

  1. bounding
  2. boxes
  3. bunch
  4. call
  5. cat
  6. class
  7. classifier
  8. considered
  9. detection
  10. detections
  11. dog
  12. evaluations
  13. faster
  14. high
  15. image
  16. interact
  17. involved
  18. limited
  19. method
  20. milliseconds
  21. move
  22. network
  23. neural
  24. object
  25. probabilities
  26. process
  27. produce
  28. produces
  29. real
  30. regions
  31. run
  32. running
  33. scores
  34. seconds
  35. simultaneously
  36. single
  37. speed
  38. split
  39. system
  40. systems
  41. thousand
  42. thousands
  43. time
  44. times
  45. trained
  46. video
  47. years
  48. yolo